Abstract

Synechocystis sp. PCC 6803 is a widely used model cyanobacterium for studying photosynthesis,
phototaxis, the production of biofuels and many other aspects. Here we present a re-sequencing
study of the genome and seven plasmids of one of the most widely used Synechocystis sp. PCC 6803
substrains, the glucose-tolerant and motile Moscow or 'PCC-M' strain, revealing considerable
evidence for recent microevolution. Seven single nucleotide polymorphisms (SNPs) specifically
shared between 'PCC-M' and the 'PCC-N' and 'PCC-P' substrains indicate that 'PCC-M' belongs to
the 'PCC' group of motile strains. The identified indels and SNPs in 'PCC-M' are likely to affect
glucose tolerance, motility, phage resistance, certain stress responses as well as functions in
the primary metabolism, potentially relevant for the synthesis of alkanes. Three SNPs in
intergenic regions could affect the promoter activities of two protein-coding genes and one
cis-antisense RNA. Two deletions in 'PCC-M' affect parts of clustered regularly interspaced short
palindromic repeats-associated spacer-repeat regions on plasmid pSYSA, in one case by an unusual
recombination between spacer sequences.
1. Introduction

With currently >4000 publications available from PubMedCentral alone, 'Synechocystis' is the
most widely used photoautotrophic prokaryotic model organism. Synechocystis sp. PCC 6803 is a
unicellular cyanobacterium that was isolated from a freshwater pond in Oakland, California. The
high popularity of Synechocystis sp. PCC 6803 stems from two facts: it was the first phototrophic
and the third organism overall for which a complete genome sequence was determined, and it easily
takes up exogenous DNA and integrates it into its chromosome by homologous recombination.

Synechocystis sp. PCC 6803 is known to occur in several distinct substrains, all going back to
the same isolate deposited in the Pasteur Culture Collection. Indeed, several studies reported
differences between the genome sequence of Synechocystis sp. PCC 6803 published in 1996 (called
here the 'GT-Kazusa' substrain) and the actual sequence found in different laboratories. A strain
history has been proposed by Ikeuchi and Tabata with an early branching into the motile PCC strain
and the non-motile ATCC 27184 strain. The latter lost motility due to a 1-bp insertion in the spkA
gene coding for a eukaryotic-type Ser/Thr protein kinase and represents the origin of the
glucose-tolerant (GT) strains, to which the 'GT-Kazusa' substrain also belongs.

For decades, Synechocystis sp. PCC 6803 has served 
as a simple model in photosynthesis research and to 
solve fundamental questions in microbial and plant 
physiology. More recently, cyanobacteria are increas- 
ingly being recognized as a promising resource for 
the production of biofuels such as hydrogen, ethanol, isobutyraldehyde and isobutanol, ethylene
and alkanes. Synechocystis sp. PCC 6803 is being developed further as a model in these
biotechnology- and systems biology-oriented studies. These facts as well as the search for
motility-associated genes prompted several re-sequencing studies of Synechocystis sp. PCC 6803
substrains, namely of the substrains GT-S, PCC-P, PCC-N, GT-I and YF. However, these studies have
not included the widely used GT and motile 'Moscow' substrain, which we here suggest to call
'PCC-M'. Furthermore, thus far no attention has been paid to the possible sequence variations in
the seven plasmids, which constitute a total sequence length of 383 486 bp, almost 10% of the
total coding capacity of Synechocystis sp. PCC 6803. This analysis provides new and reliable
sequence data for the Synechocystis sp. PCC 6803 sub- 
strain 'PCC-M', revealing several differences from the 
published sequence that can be interpreted as 
the traces of microevolution during cultivation in 
the laboratory. 

2. Materials and methods 

2.1. Origin of strain, isolation of DNA and PCR 
analysis 

Synechocystis sp. PCC 6803 substrains 'Moscow' (here called 'PCC-M'), Kazusa ('GT-Kazusa') and
Vermaas ('GT-V') were cultivated by Prof. Annegret Wilde (University of Freiburg, Germany) and
maintained as frozen stocks. The 'PCC-M' substrain was originally obtained from the laboratory of
S. Shestakov (Moscow State University) in 1993 and over the years carefully propagated for motile
colonies. The 'GT-V' strain originates from the laboratory of W. Vermaas (Arizona State
University). Genomic DNA for deep sequencing analysis was isolated from 80 ml cultures harvested
on a glass microfiber filter (GF/C, 47 mm i.d., Whatman) by vacuum filtration. The frozen filter
was ground in a mixer mill (Dismembrator MM301, Retsch, Germany) and the powder transferred into
1 ml SET buffer on ice (25% (w/v) sucrose, 1 mM EDTA, 50 mM Tris pH 7.5). One-fourth volume of
0.5 M EDTA, 2% SDS and 1.5 mg proteinase K (Sigma) were added for cell lysis at 50°C overnight.
Following phenol/chloroform extraction, one volume of 2-propanol (Roth, Germany) was added for
precipitating the DNA at room temperature for 30 min. The precipitate was washed once in
H2O/2-propanol 1:1 and once in 2-propanol, followed by 10 min centrifugation at 10 000 g, 4°C.
The pellet was washed with 70% EtOH, dried for 10 min and re-suspended in 50 µl H2O. One
microlitre of RNase A (Sigma) was added and the tube incubated at 37°C and 260 rpm overnight.
RNase was removed by another round of phenolic extraction and precipitation as described above.
The DNA was re-suspended in 75 µl H2O, the concentration was measured photometrically and DNA
quality was checked on a gel (0.8% agarose).



Genomic DNA for PCR was isolated from the cell pellet of 1 ml Synechocystis liquid culture. The
pellet was washed once with a 1:10 dilution of TE buffer (10 mM Tris-HCl pH 8; 1 mM EDTA) and
re-suspended in 70 µl of the same buffer. Cells were broken by incubation at 98°C for 10 min.
After centrifugation at 14 000 g and 4°C for 5 min, the supernatant was collected and kept on ice.
Two microlitres of it were used for PCR. For PCR reactions, Phusion® DNA polymerase (Finnzymes,
New England Biolabs) was used according to the manufacturer's instructions. To verify single
nucleotide polymorphisms (SNPs) between the different substrains, ~500 bp fragments containing the
SNP position were amplified. PCR products were excised from an agarose gel, purified (illustra GFX
PCR DNA and Gel Band Purification Kit, GE Healthcare) and sent for Sanger sequencing to GATC
Biotech (Konstanz, Germany). For sequencing of the small plasmids, several PCR reactions were
performed to get overlapping sequences, and contigs were assembled using the software
ContigExpress (Vector NTI Advance 11, Invitrogen). Alignments of the sequences were performed
using AlignX (Vector NTI Advance 11, Invitrogen).

2.2. Sequencing methods and mapping 

Sequencing of genomic DNA was carried out on an Illumina Genome Analyzer IIx system. Prior to
sequencing, the DNA was sheared by ultrasonication (Covaris, Woburn, MA, USA), resulting in
fragments of 300 bp length on average. For these fragments, paired-end sequencing according to the
manufacturer's protocol was carried out, resulting in 42 143 495 reads of 101 nt length. These
reads were analysed with two methods in order to identify SNPs, deletions and insertions. For the
first approach, we used the DNA sequence data assembler algorithm MIRA (Mimicking Intelligent Read
Assembly) to perform an assembly of the reads using the 'GT-Kazusa' genome as the reference. In the
assembly process, MIRA generates tables of candidate SNPs, insertions and deletions. We verified
these results independently by mapping all sequencing reads to the assembled chromosome and plasmid
sequences. This was done using segemehl, requiring at least 85% accuracy and reporting only the
best hit. It should be noted that segemehl reports co-optimal best hits.

3. Results 

3.1. Overview 

Sequencing of the Synechocystis sp. PCC 6803 'Moscow' substrain 'PCC-M' by Illumina (Solexa)
yielded an average 1100-fold coverage of the chromosome and five of the seven plasmids. The
existence of the two remaining plasmids was verified individually by PCR. Following assembly of
sequences, mapping to the reference strain sequences and annotation, the obtained genome and
plasmid sequences were deposited in the GenBank database with the accession numbers
CP003265-CP003272.
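As a rough plausibility check, the expected sequencing depth can be estimated from the read statistics given in Section 2.2 and the plasmid length given in the Introduction. The following minimal Python sketch assumes a chromosome length of 3 573 470 bp for the 'GT-Kazusa' reference; this value is not stated in the present text and is used for illustration only, and all function and variable names are hypothetical.

```python
# Rough depth estimate: total sequenced bases divided by reference length.
N_READS = 42_143_495        # reads reported in Section 2.2
READ_LENGTH_NT = 101        # read length reported in Section 2.2
PLASMID_TOTAL_BP = 383_486  # combined length of the seven plasmids (Introduction)
CHROMOSOME_BP = 3_573_470   # ASSUMPTION: chromosome length, not stated in this text


def expected_coverage(n_reads: int, read_length: int, reference_bp: int) -> float:
    """Average depth if every read mapped uniformly onto the reference."""
    return n_reads * read_length / reference_bp


reference_bp = CHROMOSOME_BP + PLASMID_TOTAL_BP
coverage = expected_coverage(N_READS, READ_LENGTH_NT, reference_bp)
plasmid_fraction = PLASMID_TOTAL_BP / reference_bp

print(f"expected coverage: {coverage:.0f}-fold")
print(f"plasmid share of total sequence: {plasmid_fraction:.1%}")
```

Under these assumptions the estimate lies close to the reported 1100-fold coverage, and the plasmid share is just under 10%. The real depth depends on the fraction of reads that map and is not uniform across replicons.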

Altogether, we found 45 differences (36 SNPs and 9 indels >1 bp) between the investigated
substrain 'PCC-M' and the published sequences of the 'GT-Kazusa' chromosome and plasmids used here
as references (Table 1). Of these differences, 41 are located in the chromosome and four in the
plasmids pSYSA, pSYSM and pCA2.4. For verification, about one-third of these differences were
randomly chosen and confirmed independently by PCR and Sanger sequencing of the respective regions
in substrain 'PCC-M'; no misidentified mutations were found. These DNA regions were, in addition,
amplified and compared with the sequences from substrains 'GT-Kazusa' and 'GT-V' for control and
comparison, respectively. The GT substrain 'GT-V' was chosen for comparison as it is widely used
for the dissection and analysis of photosynthetic mutants. Fully segregated PSI, PSII and Chl
biosynthesis mutants were successfully generated in this genetic background, and some of these
mutants could not be obtained in other substrains.

The number of differences between 'PCC-M' and 'GT-Kazusa' is almost twice as large as that
reported by Tajima et al. for the GT 'Kazusa' strain (GT-S), where a total of 22 differences from
the published sequence were found. All but 3 of those 22 differences were also detected in the
'PCC-M' strain studied here. The three differences unique to 'GT-S' and the 26 differences found
only between 'PCC-M' and 'GT-Kazusa' underline the existence of lineage splitting in the
Synechocystis substrains. Moreover, we found seven SNPs (#5, 13, 15, 16, 27, 32 and 33 in Tables 1
and 2) and one larger indel (#6 in Tables 1 and 2) specifically shared between the 'PCC-M' and the
'PCC-N' and 'PCC-P' substrains, indicating that 'PCC-M' belongs to the 'PCC' group of motile
substrains. 'PCC-M' and 'PCC-P' are strains that both exhibit the native positive phototaxis,
whereas the 'PCC-N' strain shows negative phototaxis.
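The counts quoted in this overview can be cross-checked with simple bookkeeping. The sketch below, in Python with purely illustrative variable names, only restates numbers given in the text and shows how the figure of 26 differences follows from them.

```python
# Bookkeeping of the differences reported in Section 3.1.
total_differences = 45              # 'PCC-M' versus the 'GT-Kazusa' reference
snps, larger_indels = 36, 9         # breakdown by type
on_chromosome, on_plasmids = 41, 4  # breakdown by replicon

gt_s_differences = 22  # differences reported for 'GT-S'
gt_s_unique = 3        # 'GT-S' differences not detected in 'PCC-M'

# Differences shared by 'GT-S' and 'PCC-M' relative to the reference.
shared_with_gt_s = gt_s_differences - gt_s_unique
# Differences seen in 'PCC-M' but not in 'GT-S'.
pcc_m_not_in_gt_s = total_differences - shared_with_gt_s

print(shared_with_gt_s, pcc_m_not_in_gt_s)
```

Both breakdowns add up to the same total of 45, and the remainder after removing the differences shared with 'GT-S' corresponds to the 26 differences mentioned above.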

3.2. SNPs in protein-coding genes 

Of the total of 36 SNPs in 'PCC-M' compared with 'GT-Kazusa', all except 1 are located in the
chromosome. The single base substitution that was found on the plasmid pCA2.4 within the repA gene
(#42 in Table 1) seems to be no mutation but an error in the published sequence of 'GT-Kazusa',
since in our PCR-control experiments, the sequence was identical in the three strains 'GT-Kazusa',
'PCC-M' and 'GT-V'. Of the 35 chromosomal SNPs compared with 'GT-Kazusa', 5 are silent base
substitutions, 14 substitutions lead to amino acid substitutions, in 6 cases a single basepair is
deleted and in 2 cases (#23 and #28) one basepair was inserted within an ORF, causing a frameshift
mutation. Furthermore, five substitutions, two single basepair insertions and one single basepair
deletion were observed in intergenic regions (IGR) of 'PCC-M' compared with the reference
(Table 1).
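The classes of chromosomal SNPs listed above can be tallied to confirm that they account for all 35 events. This is a minimal Python sketch; the dictionary keys are descriptive labels chosen here, not terms from Table 1.

```python
# Chromosomal single-nucleotide events in 'PCC-M' by class (Section 3.2).
orf_events = {
    "silent substitution": 5,
    "amino acid substitution": 14,
    "single-bp deletion": 6,
    "single-bp insertion": 2,
}
igr_events = {
    "substitution": 5,
    "single-bp insertion": 2,
    "single-bp deletion": 1,
}

n_orf = sum(orf_events.values())
n_igr = sum(igr_events.values())
n_chromosomal = n_orf + n_igr

print(f"within ORFs: {n_orf}, intergenic: {n_igr}, total: {n_chromosomal}")
```

The eight intergenic events obtained this way agree with the number of SNPs in intergenic regions discussed in Section 3.3.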

Seven SNPs are specifically shared between the 'PCC-M', 'PCC-N' and 'PCC-P' substrains. These are
in slr1865 (#13), encoding a hypothetical protein, in sll1951 (#15), encoding a haemolysin-like
protein, in slr1983 (#16), encoding a two-component hybrid sensor and regulator protein, in
slr0222 (#27), encoding the histidine kinase Hik25, a silent mutation in slr0302 (#32), encoding
a PAS/PAC and GAF sensors-containing diguanylate cyclase, one missing basepair, leaving the spkA
gene intact (#5) and, finally, in ssr1176 (#33), encoding a transposase (Tables 1 and 2).

The gene for a cell surface-localized haemolysin-like protein, HlyA (sll1951), reported to
function as a barrier against the adsorption of toxic compounds, is lacking one nucleotide in
'PCC-M' compared with the reference (difference #15). In the 'GT-Kazusa', 'GT-V' as well as the
'GT-I' and 'GT-S' strains, the presence of the additional A leads to the fusion of two ORFs that
are separate in the 'PCC-M', 'PCC-N' and 'PCC-P' substrains. As a result, Sll1951 is 1741 amino
acids in the former and only 1437 residues in the latter.

In our data, some other previously published mutations are confirmed. For instance, spkA
(sll1574; #5), a regulator of cellular motility via phosphorylation of membrane proteins, is
disrupted by a 1-bp insertion in the non-motile 'GT-Kazusa' and 'GT-V' strains, whereas it is
intact in the motile 'PCC-M' strain (Table 1). Similarly, the pilC gene (slr0162/3) required for
pili assembly has been reported to carry a frameshift mutation in the 'GT-Kazusa' and 'GT-S'
sequences. We found an intact pilC gene in 'PCC-M' (#20), as well as in the 'GT-V' substrain.

Another SNP (G-A) exists in psaA (slr1834; #9), encoding the photosynthetic P700 apoprotein
subunit Ia; however, in accordance with Tajima et al., we believe this is an annotation error in
the database as we found an A in the respective position in all three strains dealt with in this
work (Table 1). Similarly, ycf22 (sll0751; #26) is here suggested to be fused to the downstream
reading frame sll0752. Indeed, in blastp comparisons, both proteins together match against a
single, widely distributed, larger protein of 452 amino acids. This protein possesses a Ttg2C
domain (COG1463), which is found in an ABC-type
transport system involved in resistance to organic solvents. The acronym ycf stands for
hypothetical chloroplast reading frames, meaning proteins conserved in chloroplasts and also
cyanobacteria.

[...]port protein involved in maintaining the chloride ion concentration homoeostasis as required
for a functional photosystem II.

A single basepair deletion in sll1496 (#38 in Table 1), encoding mannose-1-phosphate
guanyltransferase, causes a frameshift and a premature stop in this gene in 'PCC-M'. With 515
instead of 643 amino acids, the resulting protein is severely truncated and may be rendered
function-less.
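How a single-basepair deletion can shorten a protein is illustrated by the following minimal Python sketch. It uses a made-up toy coding sequence, not the sll1496 sequence, and hypothetical helper names; it only demonstrates that removing one base shifts the reading frame so that a stop codon is read earlier.

```python
# Toy illustration of a frameshift caused by a 1-bp deletion.
# The sequence below is invented for demonstration; it is NOT sll1496.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(cds: str) -> int:
    """Count in-frame codons (including the start codon) before the first stop."""
    count = 0
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


def delete_base(seq: str, index: int) -> str:
    """Return seq with the base at the given 0-based index removed."""
    return seq[:index] + seq[index + 1:]


reference = "ATGAAACTGACCGGTTAA"    # ATG AAA CTG ACC GGT TAA
mutant = delete_base(reference, 3)  # ATG AAC TGA ... (frame shifted)

print(codons_before_stop(reference), codons_before_stop(mutant))
```

In this toy case the deletion moves a TGA triplet into frame, so translation would end after two codons instead of five.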
Table 2. Comparison of SNPs and indels found in the chromosome of 'PCC-M' with sequences from other substrains

Comparison of strains: literature + this work
Columns: #, Event, GT-Kazusa, GT-S, GT-I, PCC-P, PCC-N, PCC-M

Event type by event number (#):
S: 1, 3, 7, 8, 9, 10, 11, 13, 14, 16, 18, 19, 21, 24, 27, 29, 30, 31, 32, 33, 34, 35, 39, 41
I: 2, 6, 22, 23, 28, 36
D: 4, 5, 12, 15, 17, 20, 25, 26, 37, 38, 40

[Per-substrain check marks are not recoverable from the source.]

All events are numbered (column #) as in Table 1. The presence of the respective 'PCC-M' mutation
in the different substrains is indicated by the check marks.

The deletion of 0.6 kb in the gene slr1753 compared with the reference was also verified here in
'GT-Kazusa'.
3.3. Point mutations in IGRs

Compared with the reference, eight SNPs are located in IGRs; three of these (#7, 24 and 36) are
'PCC-M' specific. One of these SNPs (#36 in Table 1) is predicted to affect one of the recently
reported cis-antisense RNAs. The additional A between positions 3 194 022 and 3 194 023 is located
in the IGR between genes slr0533 and slr0534, encoding histidine kinase 10 (Hik10) and the soluble
lytic transglycosylase Slt. On the reverse strand, the additional T falls within the predicted -10
element of the slr0534_as3 promoter. Instead of the high-scoring CATAAT, the motif is changed to
ATTAAT. Hence, a modulation of slr0534_as3 expression compared with the reference is possible. In
contrast to its designation, this cis-antisense RNA overlaps the 3' end of gene slr0533 (hik10),
due to an error in the annotation used as the reference. In microarray analyses, slr0534_as3 of
strain 'PCC-M' was found to be moderately to highly expressed under four tested conditions.
Compared with the accumulation of the hik10 mRNA, it appeared even stronger. A function for Hik10
has been found in the perception of salt stress or transduction of the signal. The slr0534_as3
transcript may play a silencing role with regard to hik10 under non-inducing conditions. Mutation
of its promoter element may hence cause a physiological effect in the salt stress response.

Two other SNPs (at positions 831 647 and 2 400 722; #7 and #24 in Table 1) could have an impact
on the promoter strength or the regulation of the genes infA and glcP. For glcP, the initiation
site of transcription was mapped to position 2 400 666 and for infA to position 831 635
(unpublished). Thus, these two SNPs are located 12 and 56 nt upstream of the respective initiation
site of transcription. In the case of the infA promoter, the transition replaces a nucleotide
within the putative -10 element, changing it from TGTGAT to TATGAT, a much more typical motif for
a -10 element in Synechocystis. The mutation 56 nt upstream of the initiation site of
transcription of glcP might be functionally relevant as well. The gene product, a glucose
transporter, is directly relevant for the physiological ability to use glucose; its gene
expression is affected by mutation of the gene for the AbrB-type transcription factor Sll0822.
The region at position -56 might well be part of the recognized sequence.
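The distances of 12 and 56 nt follow directly from the coordinates given above. The short Python sketch below recomputes them; the data structure and names are illustrative, and strand orientation is not modelled, so only the absolute distance between each SNP and its transcription initiation site is calculated.

```python
# Distance between each intergenic SNP and the mapped transcription start site.
# Coordinates are taken from the text; strand orientation is not modelled here.
sites = {
    "infA": {"snp": 831_647, "tss": 831_635},
    "glcP": {"snp": 2_400_722, "tss": 2_400_666},
}

distances = {gene: abs(pos["snp"] - pos["tss"]) for gene, pos in sites.items()}

for gene, nt in distances.items():
    print(f"{gene}: SNP lies {nt} nt from the transcription start site")
```

A distance of 12 nt is consistent with the infA SNP falling inside the putative -10 element.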

3.4. Larger indels and plasmids 

In addition to this relatively large number of SNPs, only seven larger deletions were found on
the chromosome and two plasmids. Compared with the reference, a deletion of 0.6 kb exists in the
gene slr1753 (#4 in Table 1), which encodes, according to our data, a giant protein comprising
1549 amino acids that probably is transported to the cell surface. However, in our verification we
found this deletion also in 'GT-Kazusa' and 'GT-V'. Moreover, the deleted/inserted region consists
of long series of DNA repeats (Fig. 1), which is evidence for a possible assembly or annotation
error in the original sequence analysis.

Given the very scarce available information concerning biological functions of the plasmids in
Synechocystis sp. PCC 6803, it was interesting that all seven plasmids were detected during our
analysis. Two, pCC5.2 and pCB2.4, were initially not found. However, as they were amplified easily
by PCR, we re-inspected the unmapped sequencing reads, but still could not detect a single read
matching these plasmids. This observation may relate to a lower copy number of these compared with
the other plasmids, but this was not tested in the current study. Analysing the plasmid sequences,
we observed a remarkable genetic stability. In addition to a single-base substitution in the
plasmid pCA2.4 that might rather constitute an error in the reference sequence (see above) and a
missing mobile element on the plasmid pSYSM, two mutations were observed, both in the plasmid
pSYSA.

Two major mutations affect the clustered regularly interspaced short palindromic
repeats-CRISPR-associated proteins (CRISPR-Cas) system, located on the plasmid pSYSA. CRISPR-Cas
systems provide adaptive immunity against invading DNA in many archaea and bacteria. The plasmid
pSYSA encodes the three independent systems CRISPR1, CRISPR2 and CRISPR3. A 2399-bp deletion
encompassing the spacer-repeat regions 15-47 of CRISPR1 was detected in 'PCC-M' (#43), which also
eliminated the relatively short genes ssr7018, ssl7019, ssl7020 and ssl7021, annotated within the
spacer-repeat array of CRISPR1. However, the theoretical protein sequences of these gene products
show no conservation at all, and they might not constitute real genes. Nevertheless, the deletion
of spacer-repeat regions 15-47 of CRISPR1 is severe, since compared with the reference, it has
eliminated two-thirds (33) of its 49 spacer-repeat units. The sequence analysis suggests that the
recombination events leading to the deletion of spacer-repeat regions 15-47 must have occurred
within the direct repeats. Thus, this recombination is in agreement with previous observations
that the downstream ends of the repeat clusters are conserved such that deletions and
recombination events occur internally.

Figure 1. Alignment of the possible indel region in gene slr1753. The sequence obtained in the
verification experiment is aligned with that of the 'GT-Kazusa' reference. Two types of DNA
repeats are indicated by the filled and non-filled lozenges.

A very different type of deletion was noticed for the CRISPR2 system located on the same plasmid.
In this case, 159 bp were deleted (event #44 in Table 1). These 159 deleted bases correspond to
positions 71 499-71 657 in the reference. The deletion encompasses two repeats including the
spacer 41 in between. It is very surprising that the recombination did not occur within the repeat
sections but in the adjacent spacers 40 and 42, thus generating a new 'hybrid' spacer 40 at
positions 69 082-69 111 in the pSYSA plasmid of 'PCC-M' (Fig. 2). As a result, spacers 40, 41 and
42 of the original sequence are missing and have been replaced by this hybrid sequence. The vast
majority of described deletions in the CRISPR system occur between the direct repeats.
Non-homologous recombination between two different spacers is rare; the deletion observed here in
CRISPR2 of the plasmid pSYSA generates additional sequence diversity in the CRISPR system. Due to
the two deletions in the plasmid pSYSA, we determined its total length as 100 749 bp, compared
with 103 307 bp for the reference.
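The reported pSYSA length and the extent of the CRISPR1 deletion can be reproduced from the numbers given in this section. The following Python sketch uses illustrative variable names and assumes that the quoted coordinate ranges are inclusive at both ends.

```python
# Consistency check for the two pSYSA deletions described in Section 3.4.
REFERENCE_PSYSA_BP = 103_307  # pSYSA length in the 'GT-Kazusa' reference
CRISPR1_DELETION_BP = 2_399   # deletion of spacer-repeat regions 15-47 (#43)

# CRISPR2 deletion (#44): reference positions 71 499-71 657, assumed inclusive.
crispr2_start, crispr2_end = 71_499, 71_657
crispr2_deletion_bp = crispr2_end - crispr2_start + 1

psysa_pcc_m_bp = REFERENCE_PSYSA_BP - CRISPR1_DELETION_BP - crispr2_deletion_bp

# Share of CRISPR1 spacer-repeat units lost (units 15 to 47 of 49).
units_deleted = 47 - 15 + 1
fraction_deleted = units_deleted / 49

print(crispr2_deletion_bp, psysa_pcc_m_bp, units_deleted, round(fraction_deleted, 2))
```

The two deletions together account exactly for the difference between the reference length and the 100 749 bp determined for 'PCC-M', and the 33 lost units correspond to about two-thirds of the CRISPR1 array.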



Figure 2. Non-homologous recombination in the plasmid pSYSA affecting spacers 40, 41 and 42 of
CRISPR2. As a result of the 159-bp deletion in 'PCC-M' compared with 'GT-Kazusa', a novel hybrid
spacer 40 was generated. The direct repeats are presented as squares and the nucleotide positions
in the 'GT-Kazusa' are given according to the GenBank file NC_005230.
[Figure labels: GT-Kazusa with spacers 40, 41 and 42; PCC-M with the 159 bp deletion and hybrid
spacer 40, GGCGATATTTGGCGTGCCGCTCAGCCTT[...]]

3.5. Mobile elements

As can be seen in Tables 1 and 2 (differences #12, 17, 40 and 45), the 'PCC-M' substrain lacks
four insertion elements of the ISY203 type present in 'GT-Kazusa'. These elements are ISY203b, e
and g on the chromosome and ISY203j on the plasmid pSYSM. Three of these four indels have exactly
the same size of 1183 bp; only one is 1185 bp.

In the 'GT-S' substrain re-sequenced by Tajima et al., one of these four elements, ISY203e, is
already present, placing this strain (in accordance with Ikeuchi and Tabata) before 'GT-Kazusa' in
the strain history. The absence of ISY203b, e and g in 'PCC-M' is further shared with the strains
'GT-I', 'PCC-N' and 'PCC-P', whereas no statement is possible with regard to the possible presence
of ISY203j on the plasmid pSYSM in the latter.

With respect to the described mobile elements, 'PCC-M' appears as one of the least-derived
substrains.

4. Discussion

4.1. Strain history

'PCC-M' shows sequence differences in several genes compared with the reference sequence of
'GT-Kazusa' and also to the recently sequenced 'GT-S' strain. Kanesaki et al. concluded that 15
differences between the resequenced strains and the published GT-Kazusa sequence were annotation
errors in the latter due to sequencing artefacts, a list to which we add two more putative errors
in the database, differences #4 and #42 in Table 1. According to the proposed strain history in
Ikeuchi and Tabata, the early division of Synechocystis sp. PCC 6803 into two branches occurred
due to an insertion in spkA. Thus, our data suggest that the motile 'PCC-M' strain belongs to the
motile PCC 6803 branch, whereas the non-motile 'GT-Kazusa', 'GT-S' and 'GT-V' strains are more
closely related to each other and belong to the ATCC 27184 branch. However, the 1-bp insertion in
pilC leading to 'GT-Kazusa' as described in the proposed strain history is not present in either
'GT-S' or 'GT-V', characterizing 'GT-Kazusa' as a more derived substrain.

That 'PCC-M' belongs to the motile PCC 6803 branch is further reinforced by our finding of six
SNPs specifically shared between the 'PCC-M' and the 'PCC-N' and 'PCC-P' substrains (Tables 1 and
2). These six SNPs are in slr1865, in sll1951, encoding a haemolysin-like protein, in ssr1176,
encoding a transposase and, interestingly, in genes encoding sensor and/or regulatory proteins
(slr1983, slr0222 and slr0302) (Tables 1 and 2) and must already have been present in the
progenitor strain to 'PCC-M', 'PCC-N' and 'PCC-P'. Additional support comes from the analysis of
two larger indels (#2 and #6 in Table 1). The preceding paper, Kanesaki et al., described
difficulties in finding indels between direct repeat sequences such as slr1084 and slr2031 by
short read type re-sequencing data. Therefore, these two regions were analysed by PCR and Sanger
sequencing in addition to the re-sequencing analysis. Indeed, finding indels between direct repeat
sequences in genes slr1084 and slr2031 turned out not to be straightforward in our analysis
either. Compared with the reference, we found in both cases the additional sequences of 102 and
154 bp to be present in 'PCC-M'. This result is relevant for lineage relationships among
substrains. The additional 102 bp in gene slr1084 are shared between 'PCC-M' and the other
substrains 'PCC-P', 'PCC-N' and 'GT-I'. Therefore, this must be a deletion in the lineage leading
to GT-Kazusa and GT-S. In contrast, the additional 154 bp within and upstream of gene slr2031 are
shared between 'PCC-M', 'PCC-P' and 'PCC-N' and are absent from all studied GT substrains. These
154 bp comprise the conserved start codon of slr2031 and extend the gene by 29 codons compared
with 'GT-Kazusa'. Hence, the lack of these 154 bp in GT strains indicates a functionally adverse
deletion there. In fact, the 154-bp deletion in GT substrains was noticed before, as well as the
activity of slr2031 in the original Synechocystis sp. PCC 6803 substrains. From these
considerations, the tree shown in Fig. 3 can be derived. In this tree, 'GT-Kazusa' is displayed as
the strain with the longest evolutionary distance from the original isolate, whereas the 'PCC-M'
substrain belongs to the 'PCC' group of substrains and is probably close to the original
characteristics. All strains belonging to the 'PCC' group of substrains exhibit twitching motility
as was shown also for the original PCC strain deposited in the Pasteur Culture Collection, with
variations in the motility behaviour. Since 'PCC-M' shows motility and is tolerant to glucose, it
appears physiologically as a sort of intermediate between the two major branches, the motile and
GT branches, consistent with its characterization as being close to the original characteristics.



4.2. Re-sequencing studies of Synechocystis sp. PCC 6803

The analysis of genome sequences of cyanobacteria has had a large impact on photosynthesis,
ecology and biotechnology research. The present re-sequencing project delivers the new and
complete sequence of the Synechocystis sp. PCC 6803 'PCC-M', a substrain used in many laboratories
and in several aspects close to the original isolate. Altogether, there are now chromosomal
sequences for seven substrains of Synechocystis sp. PCC 6803 available: 'PCC-M' (this study);
'PCC-P' (positive phototaxis) and 'PCC-N' (negative phototaxis), both based on single colonies
isolated from the PCC strain and designated according to their direction of phototactic movement;
'GT-I', the standard strain in Dr Ikeuchi's group; 'YF' and 'GT-S', a current derivative of the
original stock of Synechocystis sp. PCC 6803 from which the chromosomal reference sequence for
'GT-Kazusa' was determined in 1996 and for the large plasmids in 2003, whereas the three small
plasmids had been sequenced already before.

Figure 3. Visualization of phylogenetic relationships between various strains of Synechocystis
sp. PCC 6803. The occurrence of the identified SNPs and other known events are indicated along the
branches. The eight events separating the 'GT' and 'PCC' strains from each other are given at the
branch point where these two lineages split or on the respective branches where they occurred.
Putative insertions and deletions are labelled 'Ins.' and 'Del.', respectively.
[Figure labels, in the order extracted from the figure; the assignment of event groups to
branches is not recoverable from the source:
motile
PCC-N
12bp Del. sll0698 (hik33); 1bp Ins. slr0079 (gspE); SNP slr1119; SNP slr1510 (plsX); SNP slr1962; SNP slr0370 (gabD)
PCC-P
PCC-M
SNP sll0698 (hik33); SNP slr1992; SNP slr0645
1bp Del. sll1496; SNP slr1609; SNP slr1898 (argB); SNP sll1359; SNP slr0753
45bp Del. slr1819; 1bp Ins. sll0182; SNP slr1993; SNP sll0698 (hik33); SNP ssr1175
1bp Del. sll1951 (HlyA)
SNP slr1865; SNP slr1983; SNP slr0222 (hik25); SNP slr0302 (PleD); SNP ssr1176
GT-Kazusa
1bp Ins. slr0162 (pilC); Indel IGR sll0529/28; entry ISY203b; entry ISY203g into sll1473/75 (ccaS)
GT-S
entry ISY203e; 102bp Del. slr1084
GT-I
SNP slr1085; SNP sll1799 (rplC); SNP sll1968 (pmgA); SNP slr1250 (pstB); SNP sll1605 (fabZ); SNP slr1962
1bp Ins. sll1574/75 (spkA); 154bp Del. upstream and within slr2031
R. Kunisawa, 1968. Isolation from fresh water, Oakland, California]



4.3. Mutations potentially linked to phenotype 

It is likely that most of the identified differences
between the sequenced substrains result from distinct
differences in the cultivation conditions in the differ-
ent laboratories that have selected for fixing one or
the other mutation. That also implies that the major-
ity of identified mutations are not silent but linked to
a certain effect. Indeed, most mutations in coding
regions are not silent as might be expected but
lead to frameshifts, amino acid substitutions or the
truncation of reading frames. Similarly, SNPs in non-
coding regions are probably biologically meaningful,
too. This idea received support here by linking three
'PCC-M'-specific SNPs in IGRs to the promoter
regions controlling the expression of two protein-
coding genes and one antisense RNA.
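
The distinction drawn above between silent changes, amino acid substitutions, truncations and frameshifts can be illustrated with a minimal Python sketch. It is not the annotation procedure used in this study; the function names `classify_snp` and `is_frameshift` are hypothetical, and the sketch assumes the standard genetic code and a codon already given in its reading frame.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
# Assumption: the standard codon assignments apply to the gene in question.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(codon, position, new_base):
    """Classify a single-base change in one codon (position is 0, 1 or 2)."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "silent"
    if after == "*":
        return "truncation"      # premature stop codon
    if before == "*":
        return "stop-loss"       # original stop codon removed
    return f"substitution {before}->{after}"


def is_frameshift(indel_length):
    """An insertion or deletion shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


print(classify_snp("ATG", 1, "A"))   # methionine codon changed to a lysine codon
print(is_frameshift(1), is_frameshift(12))
```

In the standard code, methionine is encoded only by ATG, so a methionine-to-lysine exchange caused by a single base change corresponds to ATG becoming AAG. By the same length rule, a 1-bp indel within a coding region shifts the reading frame, whereas a 12-bp or 45-bp deletion removes whole codons.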

For all these reasons, it appears likely that several of
the mutations specific to 'PCC-M' or shared with 'PCC-
P' and 'PCC-N' may be related to the known pheno-
types of these strains. For example, the truncation of
sll1951 (haemolysin) and possible truncation of
slr1753 (surface protein) may contribute to a stress-
induced clumping phenotype. Several other muta-
tions might cause alterations in glucose tolerance
or phototactic behaviour of these substrains.
Differences at other loci may affect phage resist-
ance, stress responses or functions in the primary
metabolism, potentially relevant for the synthesis of
alkanes or the N and C metabolism. The absence of
ISY203g in the sll1473-5 region in PCC substrains
leaves intact a photoreceptor that regulates the
expression of an alternative phycobilisome linker
gene.^^ Regarding phenotypic differences among
motile PCC substrains, it might be noteworthy that
'PCC-M', despite its general ability to be motile, is
not phototactic towards blue light (see direct com-
parison of strains in Fig. 1 of Fiedler et al.^^). Here,
the SNP #39 in the sigF gene, known to be involved
in the control of phototactic movement,^" might be
considered, as the resulting M231K substitution
could influence the DNA-protein interaction of this
group 3 sigma factor in a very subtle way. Certainly,
the subtle differences in genome sequences have to
be considered when choosing a particular substrain
for certain experiments and when comparing pheno-
types of mutant lines from different laboratories with
the wild-type strain. Information on the re-sequenced
genome and plasmid sequences, including precisely
annotated SNPs, can be found in the eight sequence
files available from GenBank under the accession
numbers CP003265-CP003272.
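
For readers who wish to retrieve the records programmatically, the accession range can be expanded into the eight individual identifiers. The following is a minimal sketch only; the helper name `accession_range` is hypothetical, and it assumes that both ends of the range share the same letter prefix and digit width.

```python
def accession_range(first, last):
    """Expand an inclusive accession range such as CP003265-CP003272 into a list."""
    prefix = first.rstrip("0123456789")       # letter prefix, e.g. 'CP'
    width = len(first) - len(prefix)          # number of digits, to keep zero padding
    if not last.startswith(prefix) or len(last) != len(first):
        raise ValueError("both accessions must share prefix and length")
    start, stop = int(first[len(prefix):]), int(last[len(prefix):])
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


accessions = accession_range("CP003265", "CP003272")
print(len(accessions), accessions[0], accessions[-1])
```

The resulting list contains eight identifiers, matching the eight sequence files (the chromosome and seven plasmids); which accession corresponds to which replicon is not stated here and should be checked in the GenBank records themselves.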